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Abstract 

Protein folding is analyzed using a replica variational formalism to investigate 
some free energy landscape characteristics relevant for dynamics. A random 
contact interaction model that satisfies the minimum frustration principle is 
used to describe the coil-globule transition (characterized by Tqq), glass tran- 
sitions (by Ta and Tk) and folding transition (by Tp). Trapping on the free 
energy landscape is characterized by two characteristic temperatures, one dy- 
namic, Ta the other static, Tk (Ta > Tk), which are similar to those found 
in mean field theories of the Potts glass. l)Above Ta, the free energy land- 
scape is monotonous and polymer is melted both dynamically and statically. 
2) Between Ta and Tk, the melted phase is still dominant thermodynamically, 
but frozen metastable states, exponentially large in number, appear. 3)A 
few lowest minima become thermodynamically dominant below Tk, where 
the polymer is totally frozen. In the temperature range between Tv and Tk, 
barriers between metastable states are shown to grow with decreasing tem- 
perature suggesting super-Arrhenius behavior in a sufficiently large system. 
Due to evolutionary constraints on fast folding, the folding temperature Tp is 
expected to be higher than Tk, but may or may not be higher than Ta. Di- 
verse scenarios of the folding kinetics are discussed based on phase diagrams 
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that take into account the dynamical transition, as well as the static ones. 



Typeset using REVT^X 



I. INTRODUCTION 



In recent years the problem of protein folding, how a biological molecule spontaneously 
organizes itself under appropriate thermodynamic conditions, has become a fertile field of 
investigation for statistical physics The conceptual difficulty of finding the global 

free energy minimum, or native structure, reliably in a short amount of time, the so-called 
Levinthal's paradox 0] has come to be understood as being related to the problem of broken 
ergodicity in glassy systems M . In the modern version of the paradox, however, it is not the 
size of the configurational search alone that is relevant but rather the topography of the free 
energy landscape. The size of the free energy barriers between the metastable states of finite 
size heteropolymer determines the local rate of exploration of the free energy landscape. In 
addition, the global topography, in particular, whether there is an energetic bias funneling 
the molecule forwards a native structure is also important to understand the folding rate. 

The earliest analytical approach to the problem captured these two aspects of the problem 
- the multiple minima problem and the guiding forces with the simplest description of the 
free energy landscape The ruggedness of the free energy surface was modeled by the 

random energy model (REM) |10[ . The REM is the simplest model of a system which like a 
spin glass is frustrated through the conflict of many competing randomly chosen interactions. 
A sufficiently large system with this free energy landscape was shown to possess a Levinthal 
paradox in its folding. More precisely, at a characteristic glass transition temperature Tk, 
while the system may thermodynamically prefer to be in unique configuration, the time to 
search for it would scale exponentially in the system size. ('K' of Tk is in honor of Kauzmann 



who attracted notice to the entropy crisis as the origin of the glass transition |TTJ. See 
below for more details.) Proteins are finite, however, so it is a quantitative issue whether 
such a system can fold on relevant biological time scales. Buttressed by this asymptotic 
argument, but also calling upon observed regularities in protein structure, Bryngelson and 
Wolynes argued that most proteins are not random but additionally satisfy a principle of 
minimal frustration, so that conflicts in attempting to satisfy individual interactions, are 
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less than expected allowing a transition to a unique configuration at a folding temperature 
Tp higher than Tk- The coherent part of the interactions could be taken account in the 
statics by introducing a conventional order parameter for folding, as in mean field theory. 
For a small system this order parameter can also act as an approximate global reaction 
coordinate for describing the self-organization process ||. This relatively simple framework 
can be elaborated to take into account additional order parameters for folding, such as local 
secondary structure formation |12] and recently correlations in the free energy landscape [|13 



The framework and the resulting mechanistic scenarios are also quite useful for organizing 
the discussions of many experiments []J . 

Another significant thread in the statistical physics of protein folding has been provided 
by theories that use the replica technology of spin glass theory || along with polymer 
physics to understand the free energy landscape [|TJ-[19|. Garel and Orland jJTJj], as well as 



Shakhnovich and Gutin [TIJ studied random heteropolymers using the traditional polymeric 
virial expansion Hamiltonian of a connected chain incorporating a Gaussian random pair 
interaction. These workers showed the connection of the random heteropolymer thermo- 



dynamics with the phase transition of Potts glass [30]. Qualitatively this was not entirely 
unexpected because a wide range of frustrated random systems without special symmetries 
fall in this universality class, which also includes the REM model |]20|-p3|. This work was 
relevant to the ruggedness issues but not to the problem of guiding forces. Soon after this 
work, Sasai and Wolynes dealt with three aspects, polymeric interaction, ruggedness of free 



energy landscape, and results of evolution, in one model |Tj| . They employed a variational 



approach modeled on Feynman's polaron theory ||24[ in the replica space to analyze the asso 



dative memory Hamiltonian p5[ . l)This model has explicit chain connectivity. 2)The target 
structure, i.e., the native structure, included in the memory set (or data base) provides a 
route to incorporate the role of principle of minimal frustration, while 3)memories other than 
the target induce ruggedness in the free energy landscape. Very recently, Ramanathan and 
Shakhnovich shed light on effects of the evolutionary constraint of minimal frustration in 
more detail [ l7| . Instead of assuming the pronounced energy gap a priori, they represented 



evolution as a process that yields sequences distributed according to a Boltzmann distribu- 
tion for a fixed target structure. Their theory shows that it is possible to have an energy gap 
large enough to stabilize the native structure only by choosing the sequence appropriately 
although it is not clear if nature actually used such a sequence selection mechanism or not. 
An alternative route to minimal frustration called 'imprinting' has also been discussed by 
Pande et al. [pL8| , which finally gives almost same result as [17 |. 



Levinthal's paradox makes stark that the conceptual issues of the folding problem revolve 
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on kinetics in at least a semi- quantitative fashion. Theory and many simulations [(26| ,f27 
concurrence suggest that real proteins fold below their folding temperature Tp but somewhat 
above the (static) glass phase transition temperature T K . Thus, to understand the kinetics 
of folding, a microscopic description of the free energy landscape above Tk is indispensable. 
Is the effective free energy landscape monotonous and smooth above Tk? We claim no. 
Even above Tk there are a number of local minima lasting many vibrational periods (Rouse 
relaxation times) in the free energy landscape. Although the variational solution corre- 
sponding with the melt phase dominates the formal Boltzmann average, actually a protein 
is dynamically trapped and feels some of the ruggedness of the free energy landscape and 
thus kinetics would strongly be affected by the presence of local minima. Then the next 
question that arises concerns the barrier heights between these local minima because these 
barrier heights determine the kinetics. We show that barrier heights grow with decreasing 
temperature until T K is reached, which directly leads us to the super-Arrhenius activation 
behavior in this temperature regime. 

To make this analysis we utilize recently developed ideas in the spin glass theory, espe- 
cially for the Potts-type spin glass P^j23|j28|j29[| . In a series of papers Kirkpatrick, Thiru- 
malai, and Wolynes, working on models of structural glasses |2T|, p— spin interaction model 
glassesQo > 2) |22j and the Potts glasses with more than 4 components [^J made the fol- 
lowing observations: l)The phase transition temperature Ta obtained by the dynamical 
theory, i.e., mode-mode coupling theory based on Langevin dynamics, is higher than that 
Tk obtained by the static theory, i.e., the ordinary replica method. 2) As temperature de- 



creases starting from the paramagnetic phase, solutions of the Thouless- Anderson- Palmer 
(TAP) equations |JD| except paramagnetic one appear exactly at Ta (See Fig.[l|). 3)For 



Ta > T > Tk, many metastable states are separated by high barriers and therefore have 
a long lifetime. Thus activated transport is the typical picture in this range ('A' of Ta 
means 'activation'). 3)The overlap order parameter, q, in the same group of 1 level replica 
symmetry breaking (RSB) takes a discontinuous jump at Tk, which reminds us of a first 
order phase transition in the order parameter, but the transition looks a second order in 
that thermodynamically, there is no latent heat. (This was known and well understood in 
the case of REM.) They called this class of phase transitions, random first order phase tran- 
sitions. Crisanti and Sommers found essentially the same behavior in the p-spin spherical 



model |28| , which buttresses the case that this type of behavior, very different from that of 
the Sherrington-Kirkpatrick model, is quite universal for systems without inversion symme- 
try. Using the p-spin spherical model Kurchan, Parisi and Virasoro succeeded in describing 
the metastable states in greater detail and the barriers between them in the replica formal- 
which we use in this paper. This formalism for describing metastable states has 
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some forbidding aspects. Like the equilibrium replica technique there are steps involving 
analytical continuation to apparently nonphysical values of replica number. More work to 
clarify the techniques would be welcome but the physical content seems very much in keep- 
ing with a transition is driven by configurational entropy. Barrier heights are determined by 
a competition between the number of available states and the energetic advantage which a 
polymer can achieve in a particular lower minimum. The results on barrier heights are the 
main focus of this paper. 

In this paper we employ the contact interaction model used in |15j with the principle 



of minimal frustration implemented at the level of Methodologically, we rely on the 



replica variational approach of Sasai and Wolynes, but extend the interpretation to the level 



of Kurchan, Parisi, and Virasoro |29j for metastable states and barriers. These methods are 
summarized in section II. In section III, we introduce some approximations so that we can 
derive expressions for the free energy in as simple form as possible. These expressions are 
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used in section IV to locate the phase transitions between different phases. We derive explicit 
expressions for four phase transition temperatures, the coil-globule transition temperature 
Tcg, the folding temperature Tp, the dynamical glass temperature Ta, and the static glass 
temperature Tk. In particular, the ruggedness of free energy landscape is characterized by 
two critical temperatures of freezing, Ta and Tk as is in the case of Potts glass. In section V 
we draw phase diagrams with fairly diverse states and discuss several scenarios of the folding 
kinetics, which can be thought as a refined version of the scenarios given in [y. Complete 
but somewhat messy expressions for the free energy are given in the appendix. 



II. REPLICA VARIATIONAL APPROACH 



A. Model 



The model we present here, while different from that used by Sasai and Wolynes [|T6 
is motivated by it. Our main goal in this section is to show how a model with short range 
(in space) interaction can be treated with the same formalism as the long range associative 
memory model. 

As a simple model of protein, we start with a standard beads type Hamiltonian which 
includes the interaction between monomers in the form of the virial expansion. 

H = k B Tj2 ^ ~ Vl? + ~ E Mfo + c^- £ 5(r, - r,)<%, - r fc ), (1) 
where r« represents a carbon of each amino acid (i = 1 ~ N), a is the Kuhn length 



3l|| , v represents finite resolution of space (see below), fry and c are the second and third 
virial coefficients, respectively. Depending on the type of amino acids individual bij have 
apparently random values, whose distribution will be given below. We assume the spatial 
resolution is v 1//3 and so any function is smeared out within this scale. Therefore, 6(0) = v^ 1 . 
The above Hamiltonian itself is directly suitable to the random heteropolymer, as was used 
in p . 



7 



Since a protein can fold because of its specific sequence, it is indispensable to incorporate 
the principle of minimal frustration, as was mentioned in the Introduction. The key idea 
here is that the energy of ground state, which corresponds to the target structure defined by 
amino acid positions {r^} of the native state, depends strongly on the specific sequence of 
amino acids, while properties of non-native structures can well be modeled by the random 
interaction between amino acids. In other words, the energy of native structure is non- 
self averaging, while most others that are structurally unrelated are self averaging. This 



is supported by numerical enumeration of all the compact states in the lattice 27-mer |27 
Using a measure of nativeness, 



(2) 



we rewrite the above Hamiltonian separating the non-self-averaging part from the others, 



where 



H = k B TY, +(l"ff)|E M(* - 

i i^j 
V 2 

+ (1 - q)c— <K r * ~ r M r J - r k) + qE T , 



^EM( r i- r i) + c \ E s ( r i- r j) s ( r 3 



(3) 



(4) 



Here we introduce an approximation, 



(5) 



which is exact either when the system is in the native structure or when the system is totally 
uncorrelated to the native structure. In eq.(|3|), the second and third terms are assumed to 
self-averaging, while the last term is non-self-averaging. After the non-self-averaging term 
representing minimal frustration of the target structure is taken into account, the interaction 
energies 6^ in eq. (|3p may be modeled, as was mentioned, by Gaussian random variables with 
probability distribution, 



(2tt6 : 



2N-1/2 



exp 



-(fey - b ) 2 /2b 2 



(6) 



Note that we do not take an average of bij in eq.(|5|), which are thought as sequence specific. 
Equations (|3|), ([5]), and @ defines the model, in which parameters, T, b , b, and E T play 
central roles. 

Here, we should bear in mind that the virial expansion is, as is well-known, good for 
extended states such as the random coil state but not very accurate for highly collapsed 
state, which we are mainly interested in. Thus, the present thermodynamic description of 
the radius of polymer, in particular, may not be particularly accurate. 



B. Replica variational formalism and mean field approximation 

We summarize the variational polaron approach in replica space used earlier jlbj. We 



calculate the free energy [F] av = — k B T[\n Z\ w averaged over the random bond interaction 
b^ with probability distribution eq.(f|), where [ ] av means the average over b^ and Z is the 
canonical partition function. To avoid the difficulty of taking an average of In Z, the replica 
trick |J utilizes a mathematical identity, lnx = lim n ^o(^ n — ^)/ n - Thus, 

\Z n ] — 1 

- (3 [F] av = [lnZ] av = lim . (7) 

n >vj f i 

We then concentrate on [Z n ] av ? which is explicitly given as 

[Z\ v = /n [dM>(M / ft Prfe-^^(W}), ( 8 ) 



where 



i>j a=l 



(9) 



The delta function in the above equation is used to fix the center of mass at the origin. 
Since the integrand in eq.(H) is a Gaussian function with respect to b^, we can integrate b^ 
out at the beginning to get 

[Z n L = I T[Vr?e-^. (10) 

J a 

The effective Hamiltonian here is of the form 



H cS = H + H 1 + H 2 , (11) 

each term of which is given as 



Ho = k B Tj2 yi+ \„, iJ , (12) 



2a 2 

a,i 



#i = E^ T + E^ 

a a ^ 



(3b 2 

&o(l ~ ?«) ~ —(1 ^ **) 2 



E W - r i ) 



+ 4E(1- <?«) E <K r ? - rj)5(r° - r£) (13) 

a i^fc 



and 



#2 = E (1 - - 9*) E W - r i )*(rf - *f). (I 4 ) 

ifo maintains the polymeric chain connectivity, H\ includes the one-replica part, and H2 
represents the inter-replica interaction. Obviously, the latter term is the driving force of the 
RSB. 

Integration over the vast configuration space in eq.([H]) is too complicated to execute 



exactly. So, we generalize the variational principle well-known in the statistical physics [34 



to replica space; For any reference Hamiltonian if re f ({rf }), we have an inequality relation, 

-^var = F rc { + (H c fi — H rc {) > F c g, (15) 

where 

-(3F Tei = lnZ ref = In / \{Vv^ H ^ , 

— /?F eff = ln[Z n ] av and (• • •) means an expectation value for the Hamiltonian H ref . This 
inequality holds before we take a limit n — > 0. Using this principle we optimize _F var with 
respect to order parameters included in the reference Hamiltonian. With the optimized F* ar , 
we get an estimate of the free energy we are seeking, [F] ay = lim ra ^ F eff /n ~ lim n _^o ^4r/ n - 
Reference trial functions need to be simple enough to lead to soluble partition func- 
tion but flexible enough to include order parameters which characterizes all relevant phase 
transitions. The coil-globule transition, the folding transition, the glass transition, can be 
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characterized by the radius of gyration, by a fluctuation scale around the native structure, 
and by an inter-replica correlation (Debye- Waller factor in the glass phase), respectively A 
natural choice for a reference Hamiltonian is 

PH iei = Aj:(r? +1 -r«y + Bj:(r?) 2 + Cj:(r?-rI) 2 + D £ d aP (r<* - if ) 2 , (16) 

a,i ct,i a,i ce^(3,i 

where A = (2a 2 ) -1 and all B, C, D, and d a p are free parameters to be optimized based on 
the variational principle eq. ([To]) . Once these parameters are optimized, they play the role of 
the global order parameters; B, C, D, and d a p represents the radius of gyration, fluctuation 
around the native structure, inter- replica correlation, and the mode of RSB, respectively. 

As for the mode of RSB, we rely on analogy to the Potts glass with components more 
than 4. As we mentioned, many other models exhibit the same type of RSB and this is 
believed to be quite universal for the system without inversion symmetry. In this class of 
systems, one level of Parisi's RSB scheme has been shown to be sufficient to describe the 
stable and metastable states KPJ and we concentrate on this level of description in this 



paper. Then, n replicas are divided into n/m groups, each of which has size m and the 
matrix element d a p is 1 if a and (3 (a ^ (3) belong to the same group and otherwise. 

C. Free energy 

We just give an overview of the derivation and the expression for the variational free 



energy F var defined in eq.(|T^) here. Detailed expressions may be found in the Appendix for 
completeness since these are not important to understand the present arguments. Physically, 
the free energy F var consists of three parts, a conformational entropy term —nTS, a one- 
replica part (Hi) which contains the coherent part of the interactions, which ultimately give 
a folding funnel as well as an effective homopolymer term, and the inter-replica term (H 2 ) 
which is responsible for the random interaction between monomers. We explain each of 
them. 

In order to carry out the variational procedure we start with the calculation of Z re f, 
which is the same as that of Sasai and Wolynes |16[]. More details can be found in We 
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first diagonalize d a p with respect to the replica index. Concentrating on each block of size 
m, we get two type of eigenmodes, a symmetric mode with the eigenvalue A + = (we call 
+ mode) and m — 1 degenerate asymmetric modes with the eigenvalue A_ = 2m (— mode). 
With the diagonalized replica index /i, we see that the integrand is just a Gaussian function 
of if. The exponent is of the form AJ2^ij r i^ij r j, where the coefficient matrix W 1 {p, = ±) 

2coshA±-l -1 ••• ^ 

-1 2coshA± -1 ••• • 
-1 2coshA± -1 ••• • 

■ ■ ■ —1 2coshA ± —1 
•••0 -1 2coshA±-l y 
where A± is defined by 2coshA± = 2 + (B + C + A±D)/A. Thus, it is straightforward, 
although complicated, to integrate over configuration space and the result is represented 
in terms of Gfj, the inverse matrix of H ± . F re f is given as a function of B, C, D, and m, 
explicit formula of which is given in the Appendix. 
The conformational entropy S is expressed by 

nTS = -F iel + (H rei )-(H ), (17) 

First, F ref is simply obtained as (3F rc{ = — lnZ ref . Second, (i? re f) can be evaluated from 
the scaling argument. If we scale as — > y/zr'^ the exponent of the integrand changes 
/3H rc {(ri) — * z/5iJ re f(r9 because i? re f is a homogeneous quadratic function of iv Thus, 
taking a derivative of In Z re f written in terms of y[ with respect to z we get an expression for 
(if re f). Finally, (H ) is simply given by (H ) = —kBTA(d\nZ rci /dA). The conformational 
entropy S expressed thus in terms of order parameters B, C, D, and m is explicitly written 
in the Appendix. 

For the estimate of (Hi), we introduce an additional approximation in the spirit of the 
mean field theory. Defining the monomer density p a (r) as p a (j) = X^5(r — rf), replacing 



is 



n ± = 



12 



expectation value of products by the products of expectation values (mean-field approxima- 
tion), we get 



&o(l - (q*)) - ^(1 - (q a )) 2 



(p a (r)) 2 dr 



+ cj^(l~( qa ))j(p a (r)) 3 dr. 
In the same way, introducing the overlap order parameter function 



(18) 
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we can express H 2 as 



(H 2 ) 



(3b 2 v 2 



53(1 - (g a ))(l - (<fc)) / / rfr lC /r 2 (g Q/3 ( ri ,r 2 )) 2 . 



(19) 



(p a ), (q), and (Qa/?) can be calculated by direct integration and are expressed in terms of 
B, C, D, and m, which are given in the Appendix. We note that it is possible to calculate 
< H\ > and < H 2 > without introducing these approximations; the result becomes more 
complex but does not change the argument discussed in this paper. Therefore, we employ 
this approximation to get simpler expressions keeping the qualitative results unchanged. It 
is also advantageous to use these approximations in that it makes it easy to compare our 
results with those of Shakhnovich and Gutin ITS . 



III. COIL, GLOBULE, GLASS AND FOLDED PHASES 

Although the above results are quite general, it is hard to grasp the physical picture 
directly from them without any numerical work. Therefore, we introduce several other 
approximations to get simple analytical expressions for free energy. We take a sort of self- 
consistent strategy in the following way. First, we assume for each phase that one specific 
order parameter (or A) is much larger than the others. Second, using this inequality, we 
obtain an asymptotic expression for the free energy and seek the stationary solution with 
respect to order parameters for each phase. Finally, we confirm that the solution indeed 
satisfies the inequality we assumed. 
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The first approximation introduced is that N ^> 1 and most of non-extensive terms are 
ignored. This may actually be a severe approximation for practical work since proteins are 
mesoscopic and possess a considerable surface area. Next, we employ the simplest description 
of monomer density, the so-called volume approximation |32| , |33| ; (p(r)) is a positive constant 
p inside polymer and is zero outside. Thus, / p x (r)dr = Vp x = Np x_1 , where V is the total 
volume of the polymer and x is an integer. Thirdly, we approximate / < Q a p > 2 dvidv2 as 
(see Appendix) 

J Ql p dr 1 dr 2 ~ Np 



A J 

(Hereafter, we drop (•••), for simplicity), for the case a and (3 belonging to the same group 
of 1 level RSB. For the other cases, G~ { is replaced by gi = Gf { + (m — 1)G^ . The other 
approximations we use are dependent on the phase we consider and will be explained below 
one by one. 



A. Coil and globule phases 



The coil and globule phases may be characterized by the inequality A ^> B,C, 2mD. 
Assuming this inequality, we consider the radius of gyration defined by 



(20) 



and we get an asymptotic expression, 

1 1 



R z 



m — 1 



1 



m ^A(B + C) m ^JA(B + C + 2mD)_ 
Random-coil state should have the radius R ~ N l l 2 a. Combining it with the above equation 
we see that B+C and 2mD are at most of order N~ 2 A, which is consistent with the inequality 
we assumed. In the same way, the radius scales as R ~ A 1//3 a in the globule phase, which 
leads us to the estimate B + C ~ A _4//3 A and 2mD ~ N~ A /' i A. Approximating that the 
polymer is roughly spherical with radius R, p can be related to R by 

N 



P 



(4/3)7Ti? 3 ' 
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An estimate of the free energy is quite straightforward. First, starting from the full ex- 
pression given in the Appendix, we can derive an asymptotic expression for the entropic 
part, 



-TS = -Nk B T 
4 



1 B + C m-1 B + C + 2mD 
+ 



my A m V A 

which is of order 0(N°) for the coil state and is O^N 1 ^ 3 ) for the globule state. Secondly, 
the inter-replica term H 2 is of order 0{N" 1 / 2 ) for the coil state and 0(N°) for the globule 
state and thus is negligible. 

For convenience, we change the independent variables from B, C and 2mD to p, q, and 
2mD. Then, we can easily optimize 2mD and q to get the solution 2mD = q = (under 
some conditions discussed below), which leads to 

R 2 3 1 /3iV\ 2/3 
4 y/A{B + C) Wp) 

and 

-TS = l Nk BT^^ oc N l ' 3 k B Tp 2 /\ 

The latter is of order 0(N°) for the coil state and 0(N 1 ^ 3 ) for the globule state. 
Thus, the free energy can be represented only as a function of p as 

Fee = N V - (bo ~^PjP + Ncjp 2 + N^ PP ^k B T/A, (21) 

where [• • -] av in the LHS is dropped for simplicity and p is a constant of order unity, the value 
itself (9/16)(47r/3) 2 / 3 is not important for the present purpose. The first two terms are of the 
form of virial expansion with coefficients bo — (3b 2 /2 and c; the random interaction induces 
an effective attraction proportional to l/T. The third term comes from entropy loss due to 
packing. Although the latter is non-extensive and is not important for many situations, we 
retain it because it will play important roles for some cases as will be explained below. 



B. Glass phase 



Since the glass phase is characterized by a small thermal fluctuations around individual 
minima, we assume 2mD ^> A~^> B ,C '. Using this relation we can straightforwardly obtain 
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the asymptotic expression for the entropic contribution to the free energy as jjj 

In 



-TS = — — -iV/c^T 
m 



^2mD\ 3/2 3 



A J 



This can be interpreted confinement entropy. 

Next, let us consider the random interaction part H 2 . behaves as A/(2mD) in 
the present limit and we have / Q 2 i/3 dvidv2 — Np(4:ii)~ 3 ^ 2 (2mD) 3 ^ 2 , for the case a and (3 
belong to the same group of RSB and ~ otherwise. Here, we have to take care of the 
finiteness of spatial resolution as mentioned before. The above estimate holds only when 
l r i — r 2 | ~ G^i I A is of order v 1 ^ 3 or larger. Otherwise, Q a p should be replaced by the 
delta function with 5(0) = v" 1 , which gives J Q 2 l/3 dr 1 dr 2 — Npv -1 . To make the expression 
continuous with respect to the order parameter D, we switch two expressions when both 
take the same value. In summary, 

(4n)- 3 / 2 Np(2mD) 3 / 2 if (2mD) 3 / 2 < (4ir) 3 / 2 /v 
Npv^ 1 otherwise. 
In the same way as above, we change independent variables from B, C, D, and m to p, 
q, 2mD, and m. We can show that q = is stable unless the stability gap is too large and 
thus we can write down the free energy expression, 



J Q 2 ap dr 1 dr 2 



m 



Pb 2 \ Ar v 2 2 Ar /36V, 1N (^)- 3 l 2 p(2mD) 3 / 2 
1 i + Nc—p 2 - N ' ■ 1 1 * 
6 

/2mm 3/2 3" 



^Giass = N V -[b Q -^-\p + Nc-p 2 - N^-(m - 1 



pv 1 



In 



(22) 



V A J 2_ 

where upper (lower) terms is taken when (2mD) 3 / 2 is smaller (larger) than (Att) 3 ^ 2 /v. The 
first two terms are those of virial expansion as above, the third term represents inter-replica 
interaction and is the driving force for the RSB, and the last term is obviously entropic. 



C. Folded phase 



The folded phase is characterized by a large C, i.e., small fluctuations around an ideal 
native structure and so we assume C ^> A, B, 2mD. We can easily obtain an asymptotic 
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expression for the entropic part, as was done in @; -TS ~ (3/2)Nk B T In g. The inter- 
replica part has the replica-symmetric contribution, / Q 2 a adridY2 = (2tc)~ 3 ^ 2 N pC 3 ^ 2 for 



any pair of a and (3. Nativeness q in this limit is obtained as q ~ u s -r 
change independent variables from £?, C, and .D to p, g, and 2mD. 
We can show that 2mD = is the stable solution and thus 



3/2 



. Using this we 
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N- 



bo(l ~ q) ~ ^-(1 - g) 2 



p + iVc (l-g)p 2 + g£ T 

o 



+ iVfc B T In 



7T\ 3/2 ? 

l) v 



+ r^ 2 -m q(l _ qf 



(23) 



where q ~ 1. The first two terms, as is usual, have the form of a virial expansion, the third 
and fourth terms represent the enthalpy and entropy change due to folding, respectively. The 
last term, coming from inter-replica interaction, tends to cancel out the effective attraction 
due to the randomness appeared in the first term because protein does not feel randomness 
when it precisely coincides with the native structure. 



In summary, eqs.(|2T|), (p2j), and fl23| ) are the expressions for the free energy for all relevant 
phases, which will be used in the next sections. 



IV. PHASE TRANSITIONS AND FREE ENERGY LANDSCAPE 

Since we have obtained simple enough expressions for the free energies of several phases, 
we can now discuss the 'phase transitions' for finite systems. Our emphasis is on description 
of ruggedness of the free energy landscape. It has been argued repeatedly that there are a 
number of minima in the glass phase. We emphasize here, however, that even above the 
(static) glass transition temperature the appropriate free energy landscape has many minima 
which affect folding kinetics drastically. 



A. Coil-globule transition (collapse) 



We first discuss the coil-globule phase transition based on the free energy expression 
eq. (|2lD as a function of density p, (p > 0). First of all, we ignore the third term, which is 
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smaller in N. Then, the lowest free energy is attained at the p* = when b e s = 60 — (3b 2 / 2 > 
0, while it becomes positive, 

when b cS < 0. Thus, the phase transition temperature T CG is determined by 

, PcGb 2 , , 

bo — = 0, (25) 

where (3cg — 1/(^b^~cg)- The third term in the free energy ( pTj) does not change this 
temperature significantly for sufficiently large polymer. There are two cases, however, where 
the third term play roles. First, for a short polymer at high temperature, the third term 
becomes dominant; this term makes the globule state unstable and so the random-coil phase 
always appears in the limit of high temperature. Second, in the vicinity of p = 0, the third 
term is the largest and thus dF/dp\ p= o is positive infinite and so the transition is first order 
with very small barrier 0(A^ _1 ). We should mention that extending the argument to include 
non-uniform description of polymer leads us to a surface term of order 0(N 2 ^ 3 ) ||32j| , which 



is not taken into account here. The third term here is 0(N 1 ^ 3 ), which is smaller than the 
surface term and so the reasoning leading to first order phase transition given here might 
not be appropriate. In any case, coil-globule transition is a first order phase transition with 
very small barrier and because of this it might be recognized as a second order like transition 
by numerical simulations, or in the laboratory. 



B. Globule-glass transition 

Next, we discuss the glass transition. First of all, we fix p at p* given in eq.([24|) ||15|| . 
Roughly, p* should not change significantly after the collapse although, rigorously speaking, 
p should be optimized simultaneously with the other order parameters. Here, we should 
remember that our starting point was based on the virial expansion which is not very 
accurate in any collapsed phase. Thus we feel the virial approach will overemphasize density 
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variations p* in the virial approximation changes too rapidly with the other thermodynamic 
parameters that would be the case for a more accurate homopolymer equation of states. 
Therefore, it is better to fix p* by choosing c appropriately at this level of description. In 
other words, we change from the independent parameter c to p*. Qualitative features do 
not change very much by this prescription. Again this will be a most accurate description 
when strong collapse is favored by the homopolymeric part of the pair interactions. 

We seek the saddle solutions of FG\a, ss (fn, X) (i.e., eq.fl22|)) with respect to m and X = 
(2mD) 3 ^ 2 . First, let us minimize .FGiass/(^ — 1) with respect to X. Forgetting the first two 
terms which are constant in X, we have two relevant terms which have opposite effects. The 
third term, the driving force to stabilize the replica symmetry breaking solution, tends to 
push X to its maximum value X max = (AttY^v^ 1 . See FigfJ, in which a dotted line with 
'TlnX' corresponds to the third term. On the other hand, the fourth term, the entropy loss 
due to freezing, prefers small X (another dotted line with '— (3X' in Fig.[|). At sufficiently 
high temperature, the fourth term which is proportional to T always dominates and so 
X = is the only stable solution, as is illustrated in the figure. As decreasing temperature, 
the third term which is inverse proportional to T becomes important at large X and, in 
addition to the solution X = 0, a new solution X = X max becomes locally stable when 



dX 
which gives 

(3 2 b 2 p*vm 



xx = °' ( 26 ) 



(27) 



4 

We call this critical temperature Ta following p3| . For structural and Potts spin glasses 
this is the transition temperature predicted by mode coupling theory ||21|| . Below Ta, there 
are always two locally stable solutions X = and X = A max , a melted phase and frozen 
phase, respectively. We should mention that eq.(p2|) is derived under the assumption that 
D ^> A, B, C, which does not hold true for the solution X = (i.e., D — 0). Thus, we have 
to use the free energy expression for the coil-globule phase keeping a small dependence on 
D. We find X = is indeed a stable solution, at the end. 
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Next, at T < Ta, we optimize 

1 / Bb 2 \ 1 777 — 1 

F(m, X max ) = -Np*v [b - V - -iVp*^6 2 (m - 1) + iV7c B T (lnj/ 7 - 3/2) (28) 

4 y 2 y 4 777 

with respect to m, where p' = (8tt) 3//2 and 7 = a 3 /t>. (7 represents flexibility of the chain and 



is about 5 for very flexible chain like protein f32| , |3"5|| . The value of p' depends to some extent 



on the approximations we use and thus we think its precise value is somewhat uncertain. 
Qualitative results are not affected by its value as long as it is of order unity. In discussing 
lattice model results we therefore treat it as adjustable.) In the same way as above, the 
second and third terms lead to effects in opposite directions (see Fig.|^). The stationarity 
condition, 

<9-^Glass( r », ^max) _ g t^gS 

dm 

leads us to m* = ^^(lnp'7 — 3/2/p*f) 1 / 2 . Inserting this into eq.(|27|) we get 



fc B T A = -yyu(lnp'7-3/2). (30) 

At T = Ta, m* A = lnp'7 — 3/2 is larger than unity and so in the ordinary replica formalism, 
this solution has been ignored for the reason that it does not contribute to the Boltzmann 
average. Physically this mean the configurational entropy of these local free energy minima 
is extensive at Ta- Recently, Kurchan, Parisi and Virasoro |29|] interpreted this solution 



as yielding the metastable states in the case of p-spin spherical model. We follow their 
argument and allow m to be larger than unity. The m* decreases in linear with T and 
coincides with unity at T = Tk defined by 



Below this temperature, this frozen solution becomes dominant in the Boltzmann average. 
The Kauzmann temperature Tk corresponds to the case where the configurational entropy 
of basins reaches zero. Eq.fl3~T|) is the same as that of Shakhnovich and Gutin |T5[ except that 



the estimates of the entropy loss, ^(lnp^— 3/2) here, are not the same. Tk is proportional to 
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the randomness, b, and is inversely proportional to the square root of the entropy loss, which 
is the same dependence found by Bryngelson and Wolynes using a statistical field version 
of Flory theory ||. Moreover, Tk is proportional to square root of p*v which represents 
packing fraction. This dependence is found in ||13|| . (At first glance, one may notice the 
difference by a factor \/2 between the present result and that of Refs. [§,0. This is simply 
because of the difference in the definition of randomness as will be discussed later.) 
Let us estimate the free energy at the saddle solutions. For the solution with X = 0, 

lobule = N V - (b - Sf\ p\ (32) 



4 V 2 



while for the solution with X = X max , 



- Nb^p*v(\np'-f - 3/2) + Nk B T(\np'-f - 3/2) + ^N(3b 2 p*v. (33) 
The energy difference between them has the simple form, 

^Glass - Globule = ^7 - 3/2)Nk B T K {£- + ^ - 2) > 0, (34) 

which touches zero at T = Tk. It should be noted that in the replica formalism the solution 
with the larger free energy dominates the Boltzmann average when m* < 1 as is known 
||, the solution with lower energy becomes dominant when m* > 1. Therefore, -FG lobule 
dominates the thermal average above Tk, while -FQ lass becomes dominant below Tk. 

Following Kurchan, Parisi, and Virasoro, we can estimate a lower bound for the free 
energy of transition states (TS) between the lowest local minima in the temperature range 
Ta > T > T K . We conjecture that the behavior of the RSB in our model is analogous to 
that of p-spin spherical model and so the TS solution is again represented by 1 level RSB 
in the temperature range considering now. Then, the TS solution (m*,X*) may be assigned 
to the one which makes F/(m — 1) maximum with respect to X (X* in Fig||). The saddle 
condition gives 

2 In — - In + m* A - m t = (35) 

Ta m* A 
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and 

xS = 4(^(4^ 

The parameter > 1 indicates these are configurational entropy driven transitions. In 
general the parameter m < 1 is conjugate to the nonextensive complexity of states below 
the thermal transition while here m* > 1 presumably represents the fact that multiple escape 
routes are possible from a trapped state. The upper equation cannot be solved analytically 
in its general form. By the Taylor expansion around Ta (T < Ta) we get 



m\ — 1 (T — T, 



AT* es F<L S - Fa iass = Nk B T A ^-± —^-^ ) + O 



2 



T-T, 



A 



Ta 



(37) 



m A + lV T A 

which clearly shows that barrier heights grow up as decreasing temperature starting from 
zero at T = Ta. Obviously, this temperature dependent barrier height will give a non- 
Arrhenius behavior in the kinetics, as is well-known in the structural glass physics. Notice 
that this behavior is consistent with Ta being a sort of spinodal for the random minima. 

We also show numerical estimate of the barrier heights at Ta > T > Tk in Fig.^. We 
solve eq.(|3"oD numerically for rrr and put it, together with eq.(^Bj), into eq.(2~2). Here, barrier 



heights grow near Ta as was shown above and then start decreasing. The latter is because 
in eq.(gg) decreases with temperature decreasing and the approximation 2mD ^> A, B,C 
becomes worse. It is expected that real barrier heights grow monotonically. The value of 
barrier height depends on the non-universal number p' and may not be accurate; The naive 
choice of p' = (87r) 3 / 2 gives AT* ~ I.OA^/c^Tk at around Tk, which is three times higher than 
that estimated from 27-mer lattice model by simulation |36 |. A smaller value p' = (87r) 3 / 2 /5 



gives a comparable barrier height to the simulation ~ 0.4A/cbTk. We should also note that 
the barrier height is always proportional to the size of polymer N in the present description, 
which might be appropriate for relatively small proteins but is probably not accurate for 
larger ones where inhomogeneous saddle points may dominate. This may also be the reason 
the naive estimate gives a larger barrier than the simulation. We will touch upon the latter 
case in the Discussions. 

22 



C. Globule- folded transition (fast folding) 



To consider the folding transition we need a free energy expression applicable in the 
whole range < q < 1. As discussed above, the entropic term in the globule phase is small 
and is negligible as the lowest approximation. Thus we simply interpolate the entropic term 
between two regimes, i.e., q ~ and q ~ 1. Thus one uses a simple form: P7| , 

2 

+ Nk B T\n(p" iq +l), (38) 



FcGF(q) = N~ 



Ml " Q) ~ ^-(1 " if 



V " - 2 , „77>T 



p + Nc— (1 -q)p 2 + qE 
o 



where p" = (2n) 3 / 2 , the value of which should not be taken as very precise. We again fix p 
at p* given by eq.pi) by choosing c appropriately. Typical free energy curves along order 



parameter q are drawn in Fig.[|, which clearly shows that the folding transition is the first 
order. Thus, the globule-folded phase transition is defined by the relation, 



CGF 



(0)=F CGF (1), (39) 



which now gives 



\p*v(3 ¥ b 2 = -\5e T \ + k B T F lnp"j, (40) 



where T F is the folding temperature, (3? = 1/(/cbT f ). We also defined the energy gap (per 
monomer) 5e T between the native energy and the average energy of collapsed states by 

Se T = E T /N - p*vb /4, (41) 

since the energy gap Se T is more useful parameter than E T to represent the bias towards 
folding [38]. The LHS of eq.(f40"D is the free energy of the globule, the first term in the RHS 



is the energy in the native state, and the second term is the entropy loss due to the folding. 

The critical situation at which free energy barrier for the folding transition disappears is 
determined by the relation 

dFcGF 



dq 



= 0, (42) 

q=0 V ' 
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which now leads to |<5e T | = k B T^p"^ + |p*u/3d& 2 - Here T D is the critical temperature below 
which downhill folding occurs and (3d = l/(k B T-o). 

Above this critical temperature T B , folding takes place over a free energy barrier. To 
think about it, we rewrite g-dependent part of free energy as 

Fcgf/N = (<5e T - ^p*vpb 2 ^j q - ^p*v[3b 2 (l - q) 3 + k B T\n(p" iq + 1) + const. 

The first term, mainly representing the enthalpy change due to the folding, is linearly de- 
creasing with respect to q. On the other hand, both the second term, the effective attraction 
due to the randomness, and the third term representing the entropy loss through the fold- 
ing are monotonically increasing function of q. Therefore, the physical origin of the barrier 
for the folding is partly the reduction of randomness upon folding and partly the entropy 
loss; Depending on the values of parameters, either one can be dominant. Because of this 
complication, the estimate of barrier height becomes complicated too. When the barrier is 
small, we can write it as 

(5e T + 5p*v(3b 2 /8 + K B Tp"jY 

AF X ~ N- — 

Spvfib 2 + 2k B T(p"-f) 2 

Note that in this analysis the barrier height is always proportional to N, as is in the case of 
the barrier between lowest misfolded states discussed before. 

D. Glass- folded transition 

First, we fix p at p* by eliminating c through eq.(^4j), as usual. The first order phase 
transition between the glass phase and the folded phase is defined by 

*Sta- = *cgf(1), (43) 

which now becomes 



- |<5e T | + k B Tp ln/7 = ~p*v&b 2 - bJp^lnp'-f - 3/2) + fc B T^(lnp' 7 - 3/2), (44) 

o 

where Tp is the folding temperature from the glass phase and (3' ¥ = l/(k B Tp). 
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V. DISCUSSIONS 



A. Phase diagram 

Taking results in the previous section together we can draw a few phase diagrams which 
are given in Figs.|| The present model includes 4 independent parameters of interest, the 
mean value of contact energy bo, variance of a contact interaction b, the energy gap between 



the native energy and the mean energy of collapsed states \Se T \ (defined in eq.(41) ), and 
temperature T and so we have no choice but to draw a few surface of sections of the complete 
four dimensional phase diagram. As was mentioned before, we are not taking some numerical 
factors p, p', and p" very literally at this level of description and we use the values deduced 
from lattice model, as will be explained below. A more sophisticated treatment is required 
to decide these values without the use of simulation data. We also note that the qualitative 
features of the phase diagram do not change with the choice of these parameters so long as 
they are of order unity. 

Figure ^a shows a surface of section on b— \Se T \ plane, which is the same representation as 
that of Bryngelson and Wolynes M . In the figure, there are three phases (bounded by solid 
curves) , the globule phase denoted by M, the glass phase denoted by G and the folded phase 
denoted by F. Roughly speaking, the phase diagram at this level is essentially the same as 
that of Bryngelson and Wolynes. Now, we can go forward to put in more information on 
the kinetics. First, the globule phase is separated into two regimes, Mi regime where the 
free energy landscape is monotonous and no local minimum except globule state exists and 
M2 regime where there are many local minima, each of which is separated by a barrier of 
order N though the formal Boltzmann average is dominated by the globule state. Next, the 
folded phase can be separated into three parts. These are the Fi regime where protein must 
pass over a free energy barrier to fold but is not trapped by the frozen state, the F 2 regime 
where the protein does not experience an activation barrier for the folding transition and 
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fast downhill folding occurs, and the F3 regime where before reaching at the folded state, the 
protein can be found in the glassy state and thus corresponds to a slow folding process with 
intermediate. In the recent synthesis by Bryngelson et al [Q], several scenarios of folding 
were classified. The Type scenario there corresponds to the F2 regime here, the Type I 
scenario takes place in the F\ regime, and the Type II scenario roughly corresponds to the 
F% regime. For the latter case, Bryngelson et al discussed the case where glass transition 
takes place at the middle of folding process assuming that glass transition temperature Tk(g) 
increases as a function of q. This seems to be the case in the lattice models studied ||39|| . 
The present analysis, however, suggests the possibility of the opposite situation, i.e., the 
glass transition temperature decreases as a function of q and thus there can be a case where 
the pre-folding state is glassy, while folded state is non-glassy. There is indeed a subtle issue 
whether the glass transition temperature increases or decreases as a function of q because 
both the ruggedness AE 2 and the residual entropy S decrease with respect to q. The latter 
quantity depends on the variational approximation used. 

Figure ^|b shows another surface of section on the ksT — b plane (when 6 is negative). 
This representation corresponds to that of Sasai and Wolynes [|16| . The same notation 
as above is used to describe each phase/regime. It is well-known that the temperature 
dependence of folding is not simple to discuss in laboratory studies of proteins because 
every parameter, in principle, depends on temperature through the entropic contribution to 
the hydrophobic force. A similar problem occurs when comparing the phase diagram for the 
virial expansion Hamiltonian to a lattice model with rigid excluded volume. In order not to 
make the argument ambiguous we first ignore any dependence of the parameters bo, b, and 
|<5e T | on temperature. The phase diagram looks similar to that found by Sasai and Wolynes. 
One outstanding difference is that, in the present diagram, we have no random-coil phase, 
which appeared in Sasai and Wolynes. One reason is that we have ignored the entropic term 
to locate the coil-globule transition. As was mentioned, the entropic term always creates 
the coil phase in the high temperature limit. The other reason is related to the difference in 
the model itself; We assumed that all parameters are independent of T, for clarity and this 
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is why we have no coil phase in this surface of section. 

To be more realistic, we next consider the temperature dependence of the average virial 
coefficient bo, which leads us to the random-coil phase in this representation. To see this, 
we employ the temperature dependence of 60 as 

b = 1 -^-'lk B T (45) 



following Grosberg and Khokhlov [p2| . Here, 9 is the so-called theta-temperature where 60 
becomes zero. The phase diagrams for this model are given in Figs.[7] for two different values 
of 9. This has the random-coil phase denoted by C as is expected and is closer to that of 



Sasai and Wolynes |L6| as well as that of Bryngelson et al |J. For a better solvent (Fig.[7|b), 



the coil phase becomes more stable and thus the coil-globule transition curve goes down. 
Depending on the nature of solvent, a direct transition from the coil phase to the folded 
phase may also occur. 

We now comment upon relations to other phase diagrams given in literature. 
Shakhnovich and Gutin |T5] showed a phase diagram on b — b plane for the random het- 



eropolymer. Restricting C = 0, we can compare two results quantitatively. l)The coil- 
globule transition curve is exactly the same. 2) Comparing eq. (pip with eq.(26) of |T5j we 



see that the only difference arises numerical factors which are not very exact in either anal- 



ysis. Next, we comment on the phase diagram given in Ramanathan and Shakhnovich |L7 . 
Roughly speaking, the selective temperature there plays similar role to 6/|<5e T | here. Thus, 



interchanging the vertical and horizontal axes, we see that Fig.|6|b and Fig.l of [0 look 
very similar. Socci and Onuchic drew a phase diagram based on the lattice MC simula- 
tion. Unfortunately, we cannot compare directly with their results because they fix the 
sequence while changing interactions between monomers, thus both bo and |<5e T | depend on 
interactions simultaneously. 
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B. Free energy landscape 



We can get some insight on the ruggedness of free energy landscape based on the analogy 
between the Potts type spin glass and the present model. Figure [l| represents schematically 
the TAP free energy in the Potts-type spin system. This can also be viewed as a TAP 
free energy landscape of the random heteropolymer, although we do not present here any 
explicit form of TAP free energy; We can define 'pure states' if the potential surface has 
many minima, each of which is separated from the others by an infinite barrier in the 
thermodynamic limit. An individual pure state s can be identified with the expectation 
value {ff} of monomers averaged for a particular local minimum. 

Referring Fig.|l] we discuss characteristics of the free energy landscape for each temper- 
ature range. l)At any temperature T above Ta, the landscape is monotonous and there is 
only one trivial solution in TAP equation. In the replica formalism, this is represented by 
the replica-symmetric solution (the solution with D = 0), which corresponds to the globule 
state physically. 2)At the temperature between Ta and Tk, there are both replica-symmetric 
and RSB solutions. The latter energy coincides with the lowest TAP free energy. To account 
for the formal Boltzmann average we sum over many TAP solutions as 

Z = J dF TAP exp [In ^(T TAP ) - (3F TAP ] (46) 

where to is the density of TAP solutions. In this temperature range, the exponent here has the 
stationary point F* above the lowest TAP energy. Therefore, a number of TAP solutions 
contribute to the Boltzmann average and, due to this degeneracy (complexity), the free 
energy F = — l//31nZ, which coincides with the replica-symmetric free energy T^g, becomes 
lower than the lowest TAP energy (= -Frsb)- For finite systems such as actual proteins, the 
infinitely long time behavior can be represented by the replica-symmetric solution, i.e., 
globule state, while the protein nevertheless feels a large barrier to move between globally 
different states and finite time dynamics may be controlled by the metastable RSB solution, 
i.e., glassy state. Activated transport among many TAP minima takes place. 3)Below T K , 
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the stationary point F* in the exponent of eq.(f46|) disappears and a few lowest minima in the 
free energy landscape dominate the Boltzmann average. Since the lowest TAP free energy 
still coincides with the RSB free energy, the glassy state becomes globally stable. 

Next, we discuss the physical interpretations of Ta and Tk. Tk is the temperature 
where the entropy becomes zero and sometimes called 'entropy crisis'. The average num- 
ber of contacts can be estimated as p*v/2 and variance of random energy distribution per 
monomer becomes Ae 2 ~ p*vb 2 /2, while the entropy loss per monomer s\ oss through freezing 
is fcB(lnp'7 — 3/2). Therefore, Tk is expressed as 



k B T K 



(Ae) 2 



\| 2si oss /k B 1 

which is consistent with the analysis of the REM |]|§. Note that the entropy factor is 
in the denominator; Increasing the flexibility per Kuhn segment 7 ||35|| , the entropy to be 



lost for freezing increases and then the Kauzmann temperature Tk becomes lower. On the 



other hand, since Ta is given by UbTa = |yp* , y(lnp / 7 — 3/2), increasing the flexibility 7, it 
becomes easier to make multiple minima in the free energy landscape and thus Ta increases. 
As a result, a polymer with large 7 possesses a relatively wide temperature range between 
Ta and Tk. 

In the previous section, we have shown that barrier heights between two lowest minima 
in the free energy landscape increase with decreasing temperature below T&_. This directly 
leads to the super- Arrhenius temperature behavior, for example, in the diffusion constant in 
this temperature regime. On the other hand, from the analogy to the REM |Q, we expect 
that barrier heights saturate at Tk, below which the diffusion constant recovers Arrhenius 
temperature dependence. To deal with the temperature range below Tk explicitly, following 



Ref. | 29| , we need to employ 2 level RSB, which is straightforward but somewhat more 



elaborate. 
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C. Mapping onto 27-mer lattice model of protein 



We can try to map the present model onto the three-letter code 27-mer lattice model 



studied exhaustively by MC simulations p9| , |36[| although the present model is not a lattice- 
type one. While this mapping must be ambiguous to some extent, it is helpful in un- 
derstanding the simulation results. In the lattice model |3T||3T)|| , l)the average energy of 
globule state is —50, which corresponds to pvbo/4 <-> —50/27 = —1.8. 2) The energy of the 
native structure is —84 and thus subtracting the average energy, we get the stability gap 
per monomer, \Se T \ «-> (84 — 50)/27 = 1.3. 3)The entropy loss for freezing to an unique 
structure from a free chain should be roughly In 5 for the cubic lattice giving the estimate 
fcs(lnp'7 — 3/2) <-> 1.6. 4)On the other hand, the entropy loss for globule-folded transi- 
tion was estimated as 20ks from simulation of 27-mer at the folding temperature giving a 
different value lnp"7 <-> 20/27 = 0.74. 5)The measured energy fluctuation at the folding 
temperature is 51 and so pvb 2 /2 «-> 51/27 = 1.88. From these mappings, we can get the 
dynamic glass temperature, the static glass temperature, and the folding temperature to be 
Ta = 1.23, Tk = 0.77, and Tp = 1.25, respectively. On the other hand, there is no real 
solution for Td; Apparently a strictly downhill scenario folding (spinodal folding) cannot be 
reached with these parameters. 

These estimates are very crude and tentative, but it still may be interesting to discuss 
the folding scenario based on it. From this estimate Ta and Tp are very close to each other 
and thus, usually folding occurs below Tp, so the protein feels multiple minima along its 
folding route. Therefore, the most probable scenario is that, after hopping among many local 
minima, protein finds its native structure which is stable thermodynamically. On the other 
hand, because the estimate are uncertain, one should consider the possibility that folding 
may occur in a regime where chain dynamics is not far from Rouse dynamics renormalized 



by mode coupling effects [40 



We should give a warning that changes in the mapping procedure may change this 
assignment of the scenario to some extent. Folding is expected to occur at temperatures 
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somewhat above Tk and thus the characteristics of free energy landscape in this regime 
are of most interest. More exhaustive study of off-lattice simulation seems to be desirable 
to study details of free energy landscape and kinetics. The present analytical calculation 
should be tested more carefully by off-lattice simulation. 

Very recently, Shakhnovich and co-workers have carried out lattice simulations that sug- 



gest random heteropolymer dynamics is not activated at high temperature ||41|| , adducing 
a polynomial dependence of the time scale on system size. This would be consistent with 
the expectation of a mode coupling analysis about Ta, where polynomial divergences with 
chain length are expected ||42|| . At low temperatures activated dependence is seen, however. 
A more exhaustive version of such studies may help in quantifying Ta versus Tk- 



D. Folding kinetics: Comments on the effects of inhomogeneity 

In this paper we concentrated on the case where all order parameters B, C, and D are 
independent of i; We have restricted our description to a homogeneously ordered or trapped 
polymer. In particular, because of this the two types of barrier heights discussed in this 
paper are proportional to N, which may not be appropriate for relatively large protein. For 
the latter, many inhomogeneous states may play important roles, especially to describe the 
folding kinetics. First, a folding nucleus can be represented as a state where part of the chain 
has much larger Cj than the other part. Nucleation [^-|46| would naturally be followed by 
the growth of the number of monomers having larger Cj. The question addressed by such a 
analysis would be the size dependence of the free energy barrier for the folding transition. 
Secondly, for quite a large polymer, a specific separation of Cj into two values may create a 
(meta)stable state. This might be related to a foldon, a small quasi-independent folding unit 
||47|| . Another possibility of inhomogeneous states is a locally trapped state, which means 
only part of chain has large Di, while the others have Di ~ 0. This might be a transition 
state between totally frozen states and that between a frozen state and melted state. We 
can extend our treatment to these inhomogeneous cases with the trial Hamiltonian, 
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PH rei = AY / (r: +1 -r?) 2 + Y,B l (r?) 2 + Y,^?-rT) 2 + £ A<M*? - rf ) 2 . (47) 

Detailed results for these problems are under study and will be reported elsewhere. 

These inhomogeneous description of polymer require more elaborated modes of replica 
symmetry breaking. For example, in the case of p-spin spherical model, 1 level RSB is 
enough to represent stable states at temperature below Tk, while at the same temperature 



range 2 level RSB describe transition states between two lowest minima ||29|| . Correlation of 
the free energy landscape has been modeled in terms of generalized REM, which includes 
continuous part of Parisi's order parameter and is a step to this direction |p~3f . 

We note here that an inhomogeneous version of the analysis here gives results analogous 
to Kirkpatrick and Wolynes's first calculation of barriers based on a simple interface between 



TAP solutions ||23|| . Indeed Parisi [|8| obtained the same dependence as KW on (T — Tk) 
i.e., AT 1 * ~ (T — Tk) -2 for Potts glasses in 3 dimensions. Later, Kirkpatrick, Thirumalai, 
and Wolynes showed how wetting of the interface between two TAP solutions in a droplet 
by other TAP solutions led to the more usual Vogel-Fulcher behavior AT* ~ (T — Tk) -1 
||21||b). This would apparently require a more complete spatially inhomogeneous RSB for 
the saddle point. We note that recently Thirumalai has argued, based on the KTW style 
argument, that barriers for traps should scale only as N 1 / 2 ||46|| . In our view further analysis 
is needed because this scaling argument should be only valid in the strict vicinity of Tk, not 
necessarily the high temperature relevant for folding of minimally frustrated proteins. 

As was noted, virial expansion used in this paper is not very appropriate to describe 
compact states. Incorporating rigid chain connectivity with hard core repulsion will over- 
come this disadvantage. In principle, Fixman's independent- oscillator comparison potential 
JIH might be applied to do this, although qualitative results discussed here are believed to 
be unchanged. The effects of the hardcore on Ta may be considerable, since even the homo- 
geneous hard sphere fluid possesses a mean field dynamical transition. Also the somewhat 
subtle questions such as change in the radius of gyration from molten-globule state to the 
folded state may be analyzed by this extension. 
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VI. CONCLUSIONS 



We have analyzed the free energy landscape of model protein based on the replica varia- 
tional method. The ruggedness of free energy landscape is manifested by two glass transition 
temperatures, Ta and Tk (T\ > Tk). 1) Above Ta, the landscape is monotonous. Dynamics 
is like that of a free Rouse chain modified by mode coupling effects. These effects would give 
dynamical freezing at Ta- 2)Between Ta and T K , the landscape has a number of metastable 
minima but the collection of them dominates the Boltzmann average as a whole. These 
are represented by the replica symmetric solution, while the RSB solution is metastable. 
3)Below T K , only a few lowest states contribute to the Boltzmann average and this is well- 
described by the RSB solution. In the second regime, the barrier between two lowest minima 
grows as decreasing temperature, which leads to the super-Arrhenius temperature depen- 
dence of diffusion constant. We believe that folding occurs somewhat above T K , which 
implies a mechanism of hopping between numerous local minima until finding native struc- 
ture through the guiding forces provided by minimal frustration and the concomitant folding 
funnel. We have also drawn a phase diagram having seven qualitatively different dynamical 
regimes. Several scenarios of folding were discussed based on this diagram although it may 
not be quantitatively accurate in all details. 
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APPENDIX A: EXPLICIT EXPRESSIONS 



Here we give explicit expressions for the variational free energy F var defined in eq.flloD in 
terms of parameters B, C, D and m that appear in the reference Hamiltonian. 

Since Z ml is a many dimensional Gaussian integral with exponent AJ2 ll ij r i7~£ij r j, we 
can execute the integration by calculating the inverse matrix of Ti^, 

f cosh(z - 1/2) A± cosh(iV - j + 1/2) A± 

^-y± I sinh N\± sinh \± J ( Ai) 

V ~ I cosh(j - 1/2)A± cosh(iV - i + 1/2)A± . > . { ' 

[ sinh AA± sinh A± * — J 

and the determinant, where A± is defined by 

sinh A± = ^l + AA/f ± , cosh A± = 1 + ||, (A2) 
where f± = B + C + A±.D and A + =0, A_ = 2m. Using these, we can write down Z re f as 

7T\MiV-l)/2 / sinhA X3n/ 2m , ^(m-lJ/Jm 



'ref 



A/ ysinhA^A+y \sinhATA_ 

.4 



xex P(^T-E tfCgrJ - nC E rP ) - (A3) 



As was explained in Sec.|I|, the conformational entropy S expressed as 

S/k B = -pF iet + (3(H ie i) - A£((r? +1 - if) 2 ), (A4) 

can be computed directly from Z re f and is written as 

3n /„ . 9 \ , sinhA + 3ra(m — 1) /„ , d \ , sinhA_ 
= 1 + A— In — i— H i - 1 + A— In ■ 



2m \ dA J sinh A^A + 2m \ OA J sinh A^A_ 

-n^rj{l-A^jG±rJ, (A5) 

where we dropped a trivial constant term which represents Gaussian free chain entropy. 

For the one- replica part, (Hi) is written in terms of (p a ) and (q) in the text. Thus, what 
we need to do here is to give expressions for the latter two quantities. The expressions for 
these just have an additional 5-function from the definition of Z re { and are easily computed 
to give 
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(pW) = E( ! f) ' exp -A(r-s,) 2 M 



-3/2 



where 



m 



G± + {m- 1)G; 



c 



s i = -^ E G u r Ti 



and 



v ^ ixgi 



-3/2 



cxp 



(A6) 

(A7) 
(A8) 

(A9) 



Finally, the inter-replica term (H 2 ) is written in terms of p a and (Qap)- The latter is 
computed as 



(Qap) 



■ 3/2 (*gi 



-3/2 



A 



exp 



/n + r 2 



E 



-3 



cxp 



(n - s^ 2 (r 2 - s^ 2 

9i 9i 



A 
2G- 



(ri - r 2 ) 5 



(A10) 



where upper (lower) line is for the case a and (3 belong to the same (different) group of RSB 
and Qi is defined by 



9i 



-Gt + 1 



m 



m 



Gu . 



In the limit of large N, expressions become considerably simple. For Gf, 



V) 



Gf< ~ 



exp(-|i - j\\±). 



(All) 



(A12) 



10 2sinhA± 

which depends only on the sequential distance \i — j\ between monomers. Therefore, G% is 
independent of i and so 



{Qaf3) 



'2nG- 



-3/2 



cxp 



A 



2G- 



(ri - r 2 f 



^ \ 2A 



-3/2 



exp 



2A (v 1 + r 2 

9i 



Pa 



rl + r 2 ^ ^2ttG^' 3/2 



cxp 



2G 



-(r 1 -r 2 f 



for the case a and f3 belonging to the same group of 1 level RSB, and 



(Qap) = Pa 



ri + r 2 \ firm 
A 



-3/2 



cxp 



(A13) 
(A14) 

(A15) 



otherwise. 
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FIGURES 

FIG. 1. Schematic view of the TAP free energy landscape with the Boltzmann distribution 
plots. a)Above T\, the free energy landscape is monotonous. b)At T\ > T > Tk, the free 
energy landscape has a number of minima and a collection of metastable states contribute to the 
Boltzmann average, which corresponds to the replica symmetric solution -Frs- c)Below Tk, the free 
energy landscape has a number of minima but only a few lowest states dominate the Boltzmann 
average, which is calculated by the RSB solution -Frsb- 

FIG. 2. The free energy as a function of X = (2mD) 3 / 2 . The dotted curve with 'TlnX' 
represents the third term of eq.(p2|) and the dotted curve with '— corresponds to the fourth 
term. Three solid curves represent sum of them for different temperatures. T = T\ is the critical 
temperature below which there is a minimum at X = X max . 

FIG. 3. The free energy as a function of m. The dotted curve with T(m — l)/m' represents 
the second term of eq.(^8|) and the dotted curve with t —j3{rn — 1)' corresponds to the third term. 
Three solid curves represent sum of them for different temperatures. T = Tk correspond to the 
critical temperature at which m* = 1. 

FIG. 4. The free energy barrier between two lowest minima as a function of temperature. 

FIG. 5. The free energy as a function of nativeness q for 4 different temperatures. 1)T > Tp, 
2)T = Tp, the folding transition temperature, 3) Tp > T > Tb where folding occurs through 
activation process, and 4)T = To below which downhill folding takes place. 
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FIG. 6. Phase diagrams derived from the present model. a)a diagram on b — |5e T | plane with 
energy unit kgT . b)a diagram on ksT — b plane with energy unit |<5e T |. Solid curves separate 
different (static) phases, dotted curves represent the boundaries at which metastable states dis- 
appear, and dashed curve shows the glass transition in the metastable unfolded state. Mi region 
is the molten globule phase with monotonous free energy landscape. M2 region still corresponds 
to the molten globule state, but rugged free energy landscape has many minima. G means the 
glassy phase where protein misfold to any one of the lowest states. F n with n = 1,2, and 3 are 
folded phase; Fi region has an energy barrier to folding but the system is not frozen, while there 
is no barrier in F2 region. F3 corresponds to the regime where pre-folding state is glassy and thus 
folding transition can be very slow. The values of parameters used are pv = 1, \np ,r y — 3/2 = 1.6, 
and lnp"7 = 0.74, which are deduced from mapping onto 27-mer lattice simulation. 

FIG. 7. Phase diagrams derived from the present model with temperature-dependent 60 defined 
in eq. (fi"5|) . (with energy unit |5e T |) a) For a poorer solvent UbO = 1.0|5e T | and b)a better solvent 
ksO = 0.5|<5e T |. Notation and values of parameters used are the same as those of Fig|| in addition 
to the random-coil phase, which we denote by C. 
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